All Questions
Tagged with anomaly-detection scikit-learn
31 questions
4 votes
1 answer
54 views
Unsupervised Isolation Forest sklearn hyperparameters
I am using sklearn's IsolationForest for an unsupervised anomaly detection task. According to the docs, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html, there are ...
3 votes
1 answer
37 views
Confirm understanding of decision_function in Isolation Forest sklearn
I am looking to better understand sklearn's IsolationForest decision_function. My understanding is that if the metric is closer to -1 then the model is more confident ...
2 votes
0 answers
36 views
Determine best hyperparameters in GridSearch - Isolation Forest
I have implemented an Isolation Forest algorithm for anomaly detection (unsupervised learning), where I divided my dataset into 1000 subsets, and for each subset, there is one isolation tree. This ...
4 votes
2 answers
177 views
Loss function in Isolation Forest
I recently came across this algorithm while working on my graduation project. As I understand it, we create subtrees for each subsample. Then we calculate the scores for each ...
0 votes
0 answers
81 views
Confused with Isolation Forest
Let's say I have an anomaly detection (unsupervised learning) dataset with 10 observations (two features). The dataset is shown below: After executing the model, the following are the results (anomalies ...
0 votes
1 answer
75 views
Detecting abnormality in a specific feature with respect to others (unsupervised?)
I have a large dataset with a feature y which is dependent in part on features x1 and x2. All features are noisy, and y is also dependent on other parameters not captured in the dataset. I would like ...
0 votes
0 answers
111 views
Understanding Isolation Forest predictions
I'm running sklearn's IsolationForest on a dataset containing 2 classes of data, one that I know is the anomaly (~1.5% of the entire dataset), the other is the normal dataset. I'm using this (shuffled)...
1 vote
1 answer
411 views
Regarding computing the centroid of high-dimensional data
In scikit-learn, or other Python libraries, are there any existing implementations to compute the centroid of high-dimensional data sets?
0 votes
2 answers
2k views
Anomaly (Outlier) Detection with Isolation Forest too sensitive even with low contamination
I'm trying to use the sklearn implementation of the Isolation Forest algorithm to detect anomalies in my time series data. However, even with a very low contamination parameter (0.0001), it is ...
3 votes
1 answer
283 views
Geolocation Based Anomaly Detection in IPs Using Isolation Forest
I'm trying to detect anomalies based on geolocation from IP addresses in a server access log file. I have created two features, country and geo_velocity, using the IP address and the timestamp of each ...
1 vote
2 answers
1k views
Cross-Validation in Anomaly Detection with Labelled Data
I am working on a project where I train the anomaly detection algorithms Isolation Forest and Auto-Encoder. My data is labelled, so I have the ground truth, but the nature of the problem requires ...
5 votes
1 answer
3k views
Interpretation of scikit-learn One-Class SVM scores
How can I interpret the scores generated by the function score_samples(X) from a scikit-learn OneClassSVM model? Is there a way ...
2 votes
2 answers
5k views
How do I interpret the classification report? Class 1 indicates abnormal data
How do I interpret the report, and how are the precision and recall values calculated for individual class labels? What is the significance of macro avg? Does this report indicate good predictions by the ...
1 vote
0 answers
116 views
Custom Decision Function for Custom Outlier Detection Algorithm
I have built a custom algorithm for semi-supervised anomaly detection, and here is an example of my output, with the probability threshold set to 0.05 and 1 = outlier, 0 = inlier: ...
1 vote
1 answer
57 views
How do I evaluate a K-Means unsupervised anomaly detection approach?
How do I evaluate a K-means clustering anomaly detection method when there is no labelled data for the anomaly class? To find the number of clusters (K), I used the silhouette score from the scikit-learn library. Scikit ...